Skip to content

API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y #20000

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 22, 2018
Merged

API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y #20000

merged 7 commits into from
Mar 22, 2018

Conversation

masongallo
Copy link
Contributor

@TomAugspurger

I added support for the user to pass a list-like to y, as discussed in #19699. The API to df.plot is relatively complex with lots of args, so lmk with questions / fdbck. Not sure if I covered everything for docs.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good thanks. Just a couple edge cases that need to be tested I think (your code should handle them already. Just for future regressions).

def test_y_listlike(self, y, lbl):
# GH 19699
df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
_check_plot_works(df.plot, x='A', y=y, label=lbl)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you get the ax from df.plot and assert that it has two lines? I think len(ax.lines) should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also check the color. What happens? Ideally it's the same as df.set_index('x').plot(), so two different colors.

# GH 19699
df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
_check_plot_works(df.plot, x='A', y=y, label=lbl)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for

  1. all integer columns: x=0, y=[1, 2]
  2. Mix of int and named columns. x=0, y=['A', 2]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mix of int and named columns. x=0, y=['A', 2]

Good point. Shouldn't this raise tho? If you try this on a DataFrame you'll get a KeyError. IMO we shouldn't allow users to specify a mix of int & named cols since it's unclear what you actually want.

else:
match = is_list_like(label_kw) and len(label_kw) == len(y)
if label_kw and not match:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that raises this assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codecov
Copy link

codecov bot commented Mar 5, 2018

Codecov Report

Merging #20000 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20000      +/-   ##
==========================================
+ Coverage   91.77%    91.8%   +0.03%     
==========================================
  Files         152      152              
  Lines       49205    49222      +17     
==========================================
+ Hits        45159    45190      +31     
+ Misses       4046     4032      -14
Flag Coverage Δ
#multiple 90.19% <100%> (+0.03%) ⬆️
#single 41.84% <0%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/plotting/_core.py 82.5% <100%> (+0.23%) ⬆️
pandas/core/window.py 96.26% <0%> (-0.01%) ⬇️
pandas/io/json/normalize.py 96.93% <0%> (+0.06%) ⬆️
pandas/util/testing.py 84.11% <0%> (+0.16%) ⬆️
pandas/plotting/_converter.py 66.81% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7273ea0...37a6eca. Read the comment docs.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Mar 6, 2018 via email

@gfyoung gfyoung added Visualization plotting Regression Functionality that used to work in a prior pandas version labels Mar 8, 2018
@masongallo masongallo changed the title API: allow list-like y argument to df.plot API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y Mar 8, 2018
@masongallo
Copy link
Contributor Author

I addressed #20056 and added test cases to cover it. Also added a bunch more test cases to increase coverage as requested.

@masongallo
Copy link
Contributor Author

Any ideas on the failure on CircleCI? I'm on a mac so it looks like the test gets skipped locally?

@@ -965,6 +965,8 @@ Plotting
^^^^^^^^

- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`)
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this first should be on Other Enhancements section

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to other enhancements?

@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False,
plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
else:
if isinstance(data, ABCDataFrame):
new_data = data.copy() # don't modify until necessary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not necessary, just use 'data' don't add something else here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the copy to fix the bug from #20056 so we get correct integer indexing - what did you have in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of _plot? Then you don't have to worry about this.

])
def test_xy_args_integer(self, x, y, colnames):
# GH 20056
df = DataFrame({"A": [1, 2], 'B': [3, 4]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do you think this should be allowed? having both labels and positions is so very confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's confusing and definitely don't think it should be allowed. Let's remove support for positions?

@TomAugspurger
Copy link
Contributor

Any ideas on the failure on CircleCI? I'm on a mac so it looks like the test gets skipped locally?

Merge master into your branch and repush. The'll be fixed.

@@ -965,6 +965,8 @@ Plotting
^^^^^^^^

- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`)
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)
- Bug in :func:`DataFrame.plot` with ``x`` or ``y`` arguments as positions (:issue:`20056`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you describe the issue a bit more?

@@ -965,6 +965,8 @@ Plotting
^^^^^^^^

- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`)
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this to other enhancements?

@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False,
plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
else:
if isinstance(data, ABCDataFrame):
new_data = data.copy() # don't modify until necessary
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of _plot? Then you don't have to worry about this.

@masongallo
Copy link
Contributor Author

@TomAugspurger good suggestion! I think it's doable depending on where we land with the API discussion -

"y must be a label or position or list of them"
)
label_kw = kwds['label'] if 'label' in kwds else False
new_data = data[y].copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just call this data

label_kw = kwds['label'] if 'label' in kwds else False
new_data = data[y].copy()

if isinstance(data[y], ABCSeries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just test if y is_scalar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate column names may mess that up, not sure if we allow that here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate column names may mess that up

right, that's why I have the check for ABCSeries

(0, [1, 2], ['bokeh', 'cython'], ['green', 'yellow'])
])
def test_y_listlike(self, x, y, lbl, colors):
# GH 19699
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give a 1-liner expln

(1, 0, [0, 1])
])
def test_xy_args_integer(self, x, y, colnames):
# GH 20056
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

else:
match = is_list_like(label_kw) and len(label_kw) == len(y)
if label_kw and not match:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this test?

int_y = is_integer(y) or all(is_integer(c) for c in y)
if int_y and not data.columns.holds_integer():
y = data_cols[y]
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the test for this? (as you expanded to both Series & DataFrame)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, removing

int_y = is_integer(y) or all(is_integer(c) for c in y)
if int_y and not data.columns.holds_integer():
y = data_cols[y]
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you actually selecting data[y] here what else could data[y] be?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this can probably be removed / replaced?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, will remove

label = kwds['label'] if 'label' in kwds else y
series = data[y].copy() # Don't modify
series.name = label
int_y = is_integer(y) or all(is_integer(c) for c in y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fragile. For better or worse (mostly worse) you can pretty much any object in pandas columns, e.g. df = pd.DataFrame({pd.Timestamp('2017'): [1, 2]}).

I think your code will fail for y=pd.Timestamp('2017'), since it's not an integer, but it also isn't iterable.

I'd recommend adding is_list_like(y) before the all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pretty much any object in pandas columns, e.g. df = pd.DataFrame({pd.Timestamp('2017'): [1, 2]})

good call

int_y = is_integer(y) or all(is_integer(c) for c in y)
if int_y and not data.columns.holds_integer():
y = data_cols[y]
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this can probably be removed / replaced?

label_kw = kwds['label'] if 'label' in kwds else False
new_data = data[y].copy()

if isinstance(data[y], ABCSeries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate column names may mess that up, not sure if we allow that here though.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. May have a closer look tomorrow, but otherwise +1

@masongallo
Copy link
Contributor Author

ping, tests green

@TomAugspurger
Copy link
Contributor

Thanks @masongallo!

@TomAugspurger TomAugspurger merged commit 02477da into pandas-dev:master Mar 22, 2018
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
… x,y (pandas-dev#20000)

* Add support for list-like y argument

* update whatsnew

* add doc change for y

* Add test cases and fix position args

* don't copy save cols ahead of time and update whatsnew

* address fdbck
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
… x,y (pandas-dev#20000)

* Add support for list-like y argument

* update whatsnew

* add doc change for y

* Add test cases and fix position args

* don't copy save cols ahead of time and update whatsnew

* address fdbck
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
… x,y (pandas-dev#20000)

* Add support for list-like y argument

* update whatsnew

* add doc change for y

* Add test cases and fix position args

* don't copy save cols ahead of time and update whatsnew

* address fdbck
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow list-like for y in DataFrame.plot.
4 participants